Tutorial on reading in data and creating some basic maps in the simple features package. The first step here is to load in some packages.
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
library(ggplot2)
library(mapview)
library(raster)
## Loading required package: sp
##
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
##
## select
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# set directory from which we read in data
ddir <- "C:/Users/Yixin Sun/Documents/Dropbox/Recruiting/2018 Recruiting/epic_orientation/data/Spatial_Exercise"
# function that creates custom map style for ggplot
theme_nothing <- function(base_size = 12, title_size = 15){
return(
theme(panel.grid.major = element_line(colour = "transparent"),
panel.grid.minor = element_line(colour = "transparent"),
panel.background = element_blank(), axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(size = title_size, hjust = 0.5),
plot.caption = element_text(size = 6)))
}
sf ObjectOur exapmle will stem from a simple dataset on crimes in the UK. Often times with point objects, the data comes in a table of latitude and longitude, which we coerce into a spatial object. The important thing here is to know which coordinate reference system we are using.
utm_crs <- 27700
# load data in and get rid of spaces in column names
streetCrime <-
read_csv(file.path(ddir, "2012-03-cumbria-street.csv")) %>%
rename(Report_by = `Reported by`,
Falls_within = `Falls within`,
Crime_type = `Crime type`)
## Parsed with column specification:
## cols(
## Month = col_character(),
## `Reported by` = col_character(),
## `Falls within` = col_character(),
## Easting = col_integer(),
## Northing = col_integer(),
## Location = col_character(),
## `Crime type` = col_character(),
## Context = col_character()
## )
# take a look at the data as a dataframe
glimpse(streetCrime)
## Observations: 4,350
## Variables: 8
## $ Month <chr> "2012-03", "2012-03", "2012-03", "2012-03", "2012...
## $ Report_by <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Falls_within <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Easting <int> 351534, 299210, 351183, 301277, 299416, 361096, 3...
## $ Northing <int> 492353, 528515, 492741, 524712, 527779, 478662, 5...
## $ Location <chr> "On or near Dowker'S Lane", "On or near Supermark...
## $ Crime_type <chr> "Burglary", "Burglary", "Burglary", "Burglary", "...
## $ Context <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
# convert from dataframe to sf object
streetCrime <-
streetCrime %>%
st_as_sf(coords = c("Easting", "Northing"), crs = utm_crs)
# take a look at the data as an sf object
glimpse(streetCrime)
## Observations: 4,350
## Variables: 7
## $ Month <chr> "2012-03", "2012-03", "2012-03", "2012-03", "2012...
## $ Report_by <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Falls_within <chr> "Cumbria Constabulary", "Cumbria Constabulary", "...
## $ Location <chr> "On or near Dowker'S Lane", "On or near Supermark...
## $ Crime_type <chr> "Burglary", "Burglary", "Burglary", "Burglary", "...
## $ Context <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ geometry <POINT [m]> POINT (351534 492353), POINT (299210 528515...
There are several ways to visualize data - base R, mapview, or ggplot2. mapview is great for quickly creating interactive visualizations of your spatial data, and is therefore a convenient way to investigate the geometries (for example, visualizing using mapview is a good way to visualize if you have the right coordinate reference system).
plot(streetCrime$geometry)
# mapview has several types of maps the data can be overlaid on
# check which type of map you'd like here http://leaflet-extras.github.io/leaflet-providers/preview/
mapview(streetCrime, map.types = "OpenStreetMap.DE")
streetCrime %>%
ggplot() +
geom_sf(aes(colour = Crime_type)) +
ggtitle("Street Crime By Type") +
theme_nothing()
UK Crime is organized by neighborhoods. Here, we read in the shapefile of the neighborhoods and aggregate the point data to the neighborhood level. Important to note that the neighborhood polygons are in longitude/latitude while the crime data is in UTM, so we must first convert the data to the same CRS
# read in neighbourhoods shapefile
nh <-
file.path(ddir, "neighbourhoods.shp") %>%
st_read(stringsAsFactors = F) %>%
st_transform(utm_crs)
## Reading layer `neighbourhoods' from data source `C:\Users\Yixin Sun\Documents\Dropbox\Recruiting\2018 Recruiting\epic_orientation\data\Spatial_Exercise\neighbourhoods.shp' using driver `ESRI Shapefile'
## Simple feature collection with 46 features and 3 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -3.64057 ymin: 54.03964 xmax: -2.159025 ymax: 55.18898
## epsg (SRID): 4326
## proj4string: +proj=longlat +ellps=WGS84 +no_defs
# plot the polygons
nh %>%
ggplot() +
geom_sf() +
geom_sf(data = streetCrime) +
ggtitle("Neighbourhood Polygons") +
theme_nothing()
# Join crime data to neighbourhood polyogns
# aggregate total crime in each neighbourhood
nh_crime <-
nh %>%
st_join(streetCrime) %>%
group_by(ID, Name, Population) %>%
summarise(total_crimes = n()) %>%
mutate(crime_per_capita = total_crimes / Population)
# plot chloropleth map - ARRANGE 2 PLOTS SIDE BY SIDE FOR LEVELS AND THEN PER
per_capita_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = crime_per_capita)) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Crime in Per Capita")
tot_crime_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = total_crimes)) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Crime in Neighbourhoods")
# use gridExtra package to print maps side by side
grid.arrange(tot_crime_map, per_capita_map, ncol = 2)
You’ll notice that most of the county is pretty light colored - this is because the distribution is very skew. Try taking a log-transform and mapping that.
log_per_capita_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = log(crime_per_capita))) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Log Crime in Neighbourhoods per Capita")
log_tot_crime_map <-
nh_crime %>%
ggplot() +
geom_sf(aes(fill = log(total_crimes))) +
theme_nothing() +
scale_fill_continuous(low = "#fee0d2", high = "#67000d") +
ggtitle("Log Crime in Neighbourhoods")
# use gridExtra package to print maps side by side
grid.arrange(log_tot_crime_map, log_per_capita_map, ncol = 2)
Some neighbourhoods are clearly more rural than others. Therefore, it might be useful to visualize the land types the analysis region. Here, we learn how to read in and extract information from a raster file.
Raster files store information for each pixel. In this case, we have a land use classification at each pixel level. What we want to do is for each neighbourhood polygon, count up the number of pixels in that polygon that fall within each land use classification.
To do this we use 3 files * raster file, for which each pixel is classified using a number, * raster legend file which is a dictionary mapping the numbers to a type of land use classification * neighbourhood polygon file which we want to overlay onto the raster file and count up the nubmer of pixels in that polygon that for each classification
# read in legend for land use raster file
legend <-
file.path(ddir, "legend.csv") %>%
read_csv()
## Parsed with column specification:
## cols(
## GRID_CODE = col_integer(),
## LABEL1 = col_character(),
## LABEL2 = col_character(),
## LABEL3 = col_character(),
## Colour = col_character()
## )
head(legend)
# read in raster file
landUse <- raster(file.path(ddir, "cumbriaLandUse.tif"))
plot(landUse)
# the extract() function from the raster package extract values of the pixel of a raster object that is covered by a polygon. Let's look at this with 1 polygon
# note that extract currently only works with `sp` objects, so we first
# transform nh to an `sp` object
nh_sp <- as(nh, "Spatial")
extract(landUse, nh_sp[1,])
## Warning in .local(x, y, ...): Transforming SpatialPolygons to the CRS of
## the Raster
## [[1]]
## [1] 18 18 18 18 18 39 18 18 18 39 18 18 3 3 39 18 18
## [18] 3 3 3 3 18 18 3 3 3 3 3 3 3 3 3 3 3
## [35] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [52] 3 3 3 3 3 2 2 3 3 3 3 3 3 2 2 2 2
## [69] 3 3 3 3 3 3 2 2 2 1 2 2 3 3 3 3 3
## [86] 2 2 2 1 1 1 2 2 3 3 3 3 3 2 2 1 1
## [103] 1 1 2 2 3 3 3 2 2 2 1 1 1 1 1 2 2
## [120] 3 3 3 2 2 2 1 1 1 1 1 1 2 3 3 3 2
## [137] 2 2 2 1 1 1 1 1 2 2 3 3 3 3 2 2 2
## [154] 1 1 1 1 1 1 2 2 3 3 3 3 2 2 2 1 1
## [171] 1 1 1 1 1 1 3 3 3 3 3 2 2 1 1 1 1
## [188] 1 1 1 1 2 3 3 3 3 3 3 1 1 1 1 1 1
## [205] 1 1 1 2 2 3 3 5 5 5 5 1 1 1 1 1 1
## [222] 1 1 1 1 2 3 5 5 5 5 5 5 5 1 1 1 1
## [239] 1 1 1 1 1 1 3 3 3 3 5 5 5 5 5 5 5
## [256] 1 1 1 1 1 1 1 1 2 255 3 3 3 3 5 5 5
## [273] 5 5 5 5 1 1 1 5 5 1 2 2 255 3 3 3 3
## [290] 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [307] 255 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5
## [324] 5 5 5 5 5 5 0 3 3 3 3 3 3 3 3 5 5
## [341] 5 5 5 5 5 5 5 5 5 5 5 5 5 0 0 0 255
## [358] 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5
## [375] 5 5 0 0 0 0 0 0 255 3 3 3 3 3 3 3 3
## [392] 3 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0
## [409] 0 255 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5
## [426] 0 0 0 0 0 0 0 0 0 255 3 3 3 3 3 3 3
## [443] 5 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0
## [460] 0 0 255 3 3 3 3 3 3 5 5 5 5 5 5 5 5
## [477] 5 5 0 0 0 0 0 0 0 0 255 5 5 3 3 5 5
## [494] 5 5 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0
## [511] 255 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0
## [528] 0 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5 5
## [545] 5 5 0 0 0 0 5 5 5 5 5 5 5 5 5 5 5
## [562] 5 0 0 5 5 5 5 5 5 5 5 5 0 0 5 5 5
## [579] 5 5 5 5 0 255 5 5 5 5 5 5 5 5 5 5 5
## [596] 5 5 39
# function for counting up land use classification in a polygon
# then return the classification that had the highest area coverage
lc_extract <- function(p){
lc_temp <-
as_tibble(p) %>%
filter(value != 0) %>%
left_join(legend, by = c("value" = "GRID_CODE")) %>%
group_by(LABEL2) %>%
summarise(n = n()) %>%
arrange(-n) %>%
filter(row_number() == 1) %>%
dplyr::select(Classification = LABEL2) %>%
ungroup
return(lc_temp)
}
# extracting highest classification type for each polygon
library(purrr) # use the functional programming package instead of apply family
lu_crime <-
extract(landUse, nh_sp) %>%
map_df(lc_extract) %>%
bind_cols(nh_crime, .)
## Warning in .local(x, y, ...): Transforming SpatialPolygons to the CRS of
## the Raster
# map land use
lu_crime %>%
ggplot() +
geom_sf(aes(fill = Classification)) +
theme_nothing() +
ggtitle("Land Use of Neighbourhoods")
lu_crime %>%
ggplot() +
geom_sf(aes(fill = Classification)) +
theme_nothing() +
ggtitle("Land Use and Crime") +
geom_sf(data = streetCrime, shape = 3) +
theme(legend.position = "none")
Here we are interested in whether crimes are close to major roads, or if there are specific crimes that usually happen near roads. We load in road data, and find distance of crimes to roads.
roads <-
file.path(ddir, "mainroads.shp") %>%
st_read(stringsAsFactors = F) %>%
st_transform(utm_crs)
## Reading layer `mainroads' from data source `C:\Users\Yixin Sun\Documents\Dropbox\Recruiting\2018 Recruiting\epic_orientation\data\Spatial_Exercise\mainroads.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3486 features and 3 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -3.602487 ymin: 54.051 xmax: -2.00917 ymax: 55.13869
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
table(roads$type)
##
## motorway motorway_link primary primary_link road
## 198 75 610 9 193
## secondary secondary_link tertiary tertiary_link trunk
## 629 3 945 11 757
## trunk_link
## 56
# keep just major roads
roads <-
roads %>%
filter(type %in% c("motorway", "motorway_link", "trunk", "trunk_link",
"primary", "primary_link"))
# Get distance to closest main road
streetCrime$dRoad <-
st_distance(streetCrime, roads) %>%
apply(1, min)
streetCrime %>%
ggplot(aes(x = Crime_type, y = dRoad))
We also have a data file of the crime counts in neighbourhoods for a number of month. The file has one for each month for each neighbourhood, and lists the number of crimes in each category. Here, we first join this with the neighbourhoods polygons and create some time series plots
allCrime <-
file.path(ddir, "allCrime.csv") %>%
read_csv
## Parsed with column specification:
## cols(
## Month = col_character(),
## Force = col_character(),
## Neighbourhood = col_character(),
## `All crime and ASB` = col_integer(),
## Burglary = col_integer(),
## `Anti-social behaviour` = col_integer(),
## Robbery = col_integer(),
## `Vehicle crime` = col_integer(),
## `Violent crime` = col_integer(),
## `Public Disorder and Weapons` = col_integer(),
## Shoplifting = col_integer(),
## `Criminal damage and Arson` = col_integer(),
## `Other Theft` = col_integer(),
## Drugs = col_integer(),
## `Other crime` = col_integer()
## )
glimpse(allCrime)
## Observations: 322
## Variables: 15
## $ Month <chr> "2011-09", "2011-09", "2011-09",...
## $ Force <chr> "Cumbria Constabulary", "Cumbria...
## $ Neighbourhood <chr> "GARS07", "GARS06", "GARS05", "G...
## $ `All crime and ASB` <int> 41, 141, 56, 82, 119, 90, 360, 1...
## $ Burglary <int> 3, 3, 0, 2, 4, 2, 8, 2, 7, 1, 3,...
## $ `Anti-social behaviour` <int> 18, 39, 36, 50, 56, 53, 194, 3, ...
## $ Robbery <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vehicle crime` <int> 3, 2, 1, 1, 7, 1, 3, 0, 3, 2, 0,...
## $ `Violent crime` <int> 2, 16, 11, 11, 19, 10, 48, 3, 12...
## $ `Public Disorder and Weapons` <int> 1, 2, 0, 3, 2, 2, 17, 0, 1, 1, 1...
## $ Shoplifting <int> 2, 7, 0, 1, 3, 2, 25, 0, 5, 1, 0...
## $ `Criminal damage and Arson` <int> 2, 54, 4, 9, 12, 11, 27, 3, 14, ...
## $ `Other Theft` <int> 7, 11, 4, 4, 8, 4, 24, 0, 7, 7, ...
## $ Drugs <int> 1, 4, 0, 1, 5, 1, 7, 0, 2, 0, 3,...
## $ `Other crime` <int> 2, 3, 0, 0, 3, 4, 7, 0, 1, 2, 0,...
# right now month is in character format - coerce this into better format
# use lubridate package
allCrime <- mutate(allCrime, Month = ymd(paste0(Month, "-01")))
glimpse(allCrime)
## Observations: 322
## Variables: 15
## $ Month <date> 2011-09-01, 2011-09-01, 2011-09...
## $ Force <chr> "Cumbria Constabulary", "Cumbria...
## $ Neighbourhood <chr> "GARS07", "GARS06", "GARS05", "G...
## $ `All crime and ASB` <int> 41, 141, 56, 82, 119, 90, 360, 1...
## $ Burglary <int> 3, 3, 0, 2, 4, 2, 8, 2, 7, 1, 3,...
## $ `Anti-social behaviour` <int> 18, 39, 36, 50, 56, 53, 194, 3, ...
## $ Robbery <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ `Vehicle crime` <int> 3, 2, 1, 1, 7, 1, 3, 0, 3, 2, 0,...
## $ `Violent crime` <int> 2, 16, 11, 11, 19, 10, 48, 3, 12...
## $ `Public Disorder and Weapons` <int> 1, 2, 0, 3, 2, 2, 17, 0, 1, 1, 1...
## $ Shoplifting <int> 2, 7, 0, 1, 3, 2, 25, 0, 5, 1, 0...
## $ `Criminal damage and Arson` <int> 2, 54, 4, 9, 12, 11, 27, 3, 14, ...
## $ `Other Theft` <int> 7, 11, 4, 4, 8, 4, 24, 0, 7, 7, ...
## $ Drugs <int> 1, 4, 0, 1, 5, 1, 7, 0, 2, 0, 3,...
## $ `Other crime` <int> 2, 3, 0, 0, 3, 4, 7, 0, 1, 2, 0,...
# join with neighbourhoods too create a neigbourhoods-month panel
# note that a left_join preserves the class of the first argument, in this case
# it results in an sf dataframe
nh_all <-
nh %>%
left_join(allCrime, by = c(ID = "Neighbourhood"))
# make sure neighbourhood-month is unique
nh_all %>%
group_by(ID, Month) %>%
filter(n() > 1)
## Warning in `[<-.data.frame`(`*tmp*`, is_list, value = list(`18` = "<S3:
## sfc_GEOMETRY>")): replacement element 1 has 1 row to replace 0 rows
# plot just the first 4 months of all crimes
# FORMAT DATES INTO SEPTEMBER 2011 ETC
# FIGURE OUT COLOR GRADIENTs
nh_all %>%
filter(Month <= ymd(20111201)) %>%
ggplot(aes(fill = `All crime and ASB`)) +
geom_sf() +
facet_wrap(~ Month) +
theme_nothing() +
scale_fill_gradient(low= "white", high = "red") +
theme_nothing()
MAYBE JUST HAVE THEM USE A SEPARATE DATASET WITH SIMILAR TASKS? * Chloropleth map of all the violent crimes in each neighbourhood using all the data from allCrime * Use two methods, mapping and line graph, to visualize how crime patterns have changed over time for just neighbourhoods GARZ01, GARZ02, GARZ03, GARZ04, and GARZ05 * Repeat exercise with roads, except this time, using bodies of water. The water shapefile path should be file.path(ddir, "naturalwater.shp").